Abstract

In this paper, we focus on automated addition of fault-tolerance to an existing fault-intolerant real-time
program. We consider three levels of fault-tolerance, failsafe, nonmasking, and masking, based on
the properties satisfied in the presence of faults. Furthermore, for failsafe and masking fault-tolerance, we
introduce two cases, soft and hard, based on satisfaction of timing constraints in the presence of faults. We
present sound and complete algorithms with polynomial time complexity in the size of region graphs for
the case where soft-failsafe, nonmasking, and soft-masking fault-tolerance is added to an existing real-time
program. Furthermore, we propose a sound and complete algorithm with polynomial time complexity in
the size of region graphs for adding hard-failsafe fault-tolerance, where the synthesized program is required
to satisfy at most one bounded response property in the presence of faults. Moreover, we show that the
problem of adding hard-masking fault-tolerance, where the synthesized fault-tolerant program is required to
satisfy multiple bounded response properties in the presence of faults, is NP-hard in the size of the region
graph. Thus, this work characterizes classes of problems where adding fault-tolerance to real-time programs
is expected to be feasible and where the complexity is too high.
Formal methods.

1  Introduction 

Fault-tolerance and real-time properties are crucial assurance requirements in many computing systems. However,
since fault-tolerance and real-time properties often impose conflicting constraints on systems, they are
not easy to combine. Meeting real-time properties needs predictability, whereas fault-tolerance requires programs
to continue to function even in the presence of unanticipated faults. In other words, while satisfaction of timing
constraints requires a priori knowledge of the system's temporal operation, fault-tolerance is built on the
principle that faults occur unexpectedly and that faults must be handled through some recovery mechanism.

Automated program synthesis is the problem of designing an algorithmic method to find a program that
satisfies a required set of properties. Such automated synthesis is desirable, as it ensures that the synthesized
program is correct by construction even if its required set of properties has conflicting constraints such as
fault-tolerance and real-time. In the existing specification-based synthesis methods, a change in the specification
requires us to synthesize from scratch. Thus, it would be advantageous if we could reuse the previous
efforts made to synthesize fault-intolerant real-time programs and somehow add fault-tolerance to them. Moreover,
such addition is especially useful if the fault-intolerant real-time program is designed manually, e.g., for
ensuring that the original program is efficient.

With this motivation, in this paper, we focus on designing synthesis algorithms that solely add fault-tolerance
to real-time programs. Such synthesis methods are desirable, as it may not be possible to anticipate
all faults that a program may be subject to at design time. Our goal in this work is to concentrate on algorithms
with manageable time and space complexity, i.e., complexity that is comparable to the corresponding
complexity of existing model checking techniques for fault-tolerant programs in the dense real-time model.

Regarding fault-tolerance, we consider three levels, based on the properties satisfied in the presence of
faults. Intuitively, a failsafe fault-tolerant program does not violate its safety specification even in the presence
of faults, i.e., a bad thing does not occur when the program is running in the presence of faults. A nonmasking
program ensures recovery to its normal behavior after the occurrence of faults. A masking fault-tolerant program
has both properties, i.e., in the presence of faults, it does not violate its safety specification while ensuring
recovery to its normal behavior. Regarding real-time, we propose two cases, soft and hard, based on satisfaction
of timing constraints in the presence of faults (cf. Section 3 for examples).

1.1  Related  Work 

In real-time computing literature, fault-tolerance has mostly been addressed in the context of scheduling
algorithms (e.g., [1-5]). In fault-tolerant real-time scheduling, the objective is to find the optimal schedule of
a set of tasks on a set of processors dynamically, such that the largest possible set of tasks meet their deadlines.
Since time complexity is a critical issue in dynamic scheduling, most of the proposed algorithms are
in the form of heuristics designed for specific platforms or architectures and for a special type of faults (e.g.,
transient, fail-stop, Byzantine, etc.).

The problem of synthesizing untimed fault-tolerant programs has been studied in the literature from different
perspectives. In [6-10], the authors propose synthesis methods, heuristics, and enhancement algorithms for
adding fault-tolerance and multitolerance to existing programs in the high (respectively, low) atomicity model,
where processes can (respectively, cannot) read and write all the program variables in one atomic step. In [11],
Attie, Arora, and Emerson study the problem of synthesizing fault-tolerant concurrent untimed programs from
temporal logic specifications expressed as CTL formulas.

Synthesis of real-time systems has mostly been studied in the context of timed automata from a game-theoretic
perspective [12-19]. In these papers, the common assumption is that the existing program (called a
plant) and/or the given specification is deterministic. Moreover, since the authors of the aforementioned work
consider highly expressive specifications, the complexity of the proposed methods is also very high. For example,
the algorithms presented in [12-15, 18, 19] are EXPTIME-complete. Moreover, deciding the existence of a solution
(called a controller) in [16, 17] is 2EXPTIME-complete.

Online fault detection in a given dense-timed automaton is studied by Tripakis in [20]. The author proposes
a polynomial space online algorithm to design a diagnoser that detects faults in the behavior of the given
timed automaton after they occur. In this modeling, it is assumed that (1) the given system is in the synchronous
model, and (2) faults and errors are the same thing. Bouyer, Chevalier, and D'Souza [21] address the same
problem where the diagnoser is supposed to be realizable as a deterministic timed automaton or an event record
automaton.

1.2  Contributions 

The point of departure of our work from the above related work is as follows. In this paper we (i) consider
a generic fault-tolerance framework for real-time programs independent of platform, architecture, and type of
faults; (ii) extend the previous work by Kulkarni and Arora [6] for adding fault-tolerance to untimed programs;
(iii) consider a general notion of real-time programs that covers both deterministic and nondeterministic programs
in both synchronous and asynchronous models; and (iv) consider different levels of fault-tolerance for
real-time systems based on satisfaction of properties and timing constraints. Furthermore, we present a class of
specifications where we can express typical requirements for specifying real-time and fault-tolerant computing
systems and we show that the complexity of synthesis algorithms for this class of specifications is manageable
in the sense that it is comparable to that of existing model checking techniques for real-time programs [22]. The
main results in this paper are as follows:

1. We propose a generic formal framework that defines the notions of faults and levels of fault-tolerance in the
context of real-time programs.

2. We present polynomial time (in the size of the region graphs) sound and complete algorithms that transform
fault-intolerant real-time programs into

(a)  soft-failsafe,  nonmasking,  and  soft-masking  fault-tolerant  programs,  and 

(b)  hard-failsafe  fault-tolerant  programs,  where  the  synthesized  fault-tolerant  program  is  required  to 
satisfy  at  most  one  bounded  response  property  in  the  presence  of  faults. 

3. We present a sound polynomial time algorithm that transforms a fault-intolerant real-time program into a
hard-masking fault-tolerant program, where the synthesized fault-tolerant program is required to satisfy
at most one bounded response property in the presence of faults.

4. We note that the problem of adding hard-masking fault-tolerance, where the synthesized program is
required to satisfy multiple bounded response properties in the presence of faults, is NP-hard.

Organization of the paper. In Section 2, we present the preliminary concepts. We formally define the
notions of faults and fault-tolerance in the context of real-time programs in Section 3. In Section 4, we formally
state the problem of adding fault-tolerance to real-time programs. We present our transformation algorithms
and NP-hardness result in Section 5. Then, in Section 6, we answer the potential questions raised about our
approach. Finally, in Section 7, we make the concluding remarks and discuss future work.

2  Preliminaries 

In this section, we present the preliminary concepts and formal definitions of real-time programs, specifications,
and region graphs. Programs are specified in terms of their state space and their transitions [23]. The definition
of specifications is adapted from Henzinger [24]. Finally, the notion of region graph is due to Alur and Dill [25].

2.1  Program 

A program includes a finite set $V$ of discrete variables and a finite set $X$ of clock variables. Each discrete
variable is associated with a finite domain $D$ of values. A location is a function that maps each discrete variable
to a value from its respective domain. For the set $X$ of clock variables, the set $\Phi(X)$ of clock constraints $\varphi$ is
inductively defined by the grammar: $\varphi ::= x \le c \mid x \ge c \mid x < c \mid x > c \mid \varphi \wedge \varphi$, where $x \in X$ and $c \in \mathbb{Z}_{\ge 0}$.
A clock valuation is a function $\nu : X \rightarrow \mathbb{R}_{\ge 0}$ that assigns a real value to each clock variable. Furthermore, for
$\tau \in \mathbb{R}_{\ge 0}$, $\nu + \tau$ denotes the clock valuation with $(\nu + \tau)(x) = \nu(x) + \tau$ for every clock $x$. Also, for $\lambda \subseteq X$,
$\nu[\lambda := 0]$ denotes the clock valuation for $X$ which assigns 0 to each $x \in \lambda$ and agrees with $\nu$ over the rest of
the clock variables in $X$.

A state of a program (denoted $\sigma$) is a pair $(s, \nu)$, such that $s$ is a location and $\nu$ is a clock valuation
for $X$ at location $s$. Since the domain of clock variables ranges over the real numbers, the state space of a
program (the set of all possible states) is infinite. A transition of a program (denoted $(\sigma_0, \sigma_1)$) is of the form
$(s_0, \nu_0) \rightarrow (s_1, \nu_1)$. Transitions are classified into two types:

• Delay (elapse of time): for a state $\sigma = (s, \nu)$ and a time duration $\delta \in \mathbb{R}_{\ge 0}$ (denoted $(\sigma, \delta)$),
$(s, \nu) \rightarrow (s, \nu + \delta)$.

• Jump (location switch): for a state $(s_0, \nu)$, a location $s_1$, and a set $\lambda$ of clock variables,
$(s_0, \nu) \rightarrow (s_1, \nu[\lambda := 0])$.

We say a state $\sigma_1$ is passed by the delay $(\sigma_0, \delta)$ if $\sigma_1 = \sigma_0 + \varepsilon$ for some $\varepsilon \in \mathbb{R}_{\ge 0}$ such that $\varepsilon < \delta$.
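The two transition types can be illustrated with a minimal Python sketch. It is not part of the formal model: the names `delay`, `jump`, and `passed_by` are hypothetical, a state is assumed to be a pair of a location label and a dictionary of clock values, at least one clock is assumed, and exact rationals stand in for real-valued clocks.

```python
from fractions import Fraction

def delay(state, duration):
    """Delay transition (s, v) -> (s, v + duration): every clock advances equally."""
    location, valuation = state
    return (location, {x: t + duration for x, t in valuation.items()})

def jump(state, target_location, resets):
    """Jump transition (s0, v) -> (s1, v[resets := 0])."""
    _, valuation = state
    new_valuation = {x: Fraction(0) if x in resets else t
                     for x, t in valuation.items()}
    return (target_location, new_valuation)

def passed_by(start, duration, candidate):
    """True iff candidate = start + e for some e with 0 <= e < duration."""
    location, valuation = start
    cand_location, cand_valuation = candidate
    if cand_location != location or cand_valuation.keys() != valuation.keys():
        return False
    # All clocks must have advanced by the same offset e.
    offsets = {cand_valuation[x] - valuation[x] for x in valuation}
    return len(offsets) == 1 and 0 <= next(iter(offsets)) < duration

# Hypothetical two-clock state used only for illustration.
s0 = ("idle", {"x": Fraction(1, 2), "y": Fraction(2)})
s1 = delay(s0, Fraction(3, 2))   # both clocks advance by 3/2
s2 = jump(s1, "busy", {"x"})     # clock x is reset, y keeps its value
```

Note that the target state of a delay is not itself passed by that delay, since the offset must be strictly smaller than the duration.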

A program $\mathcal{P}$ is a tuple $\langle S_p, \psi_p \rangle$, where $S_p$ is the state space, and $\psi_p$ is a set of transitions. Let $\psi^s_p$ and $\psi^d_p$
denote the set of jump and delay transitions in $\psi_p$, respectively. A state predicate is a Boolean expression over
the variables of $\mathcal{P}$. Note that, in such an expression, a clock constraint must be picked from $\Phi(X)$, i.e., clock
variables can only be compared with nonnegative integers. A state predicate can also be expressed as a subset of
$S_p$ such that it is definable by the above syntax of clock constraints. A state predicate $S$ is closed in the program
$\mathcal{P}$ iff $((\forall (\sigma_0, \sigma_1) \in \psi^s_p : (\sigma_0 \in S \Rightarrow \sigma_1 \in S)) \wedge (\forall (\sigma, \delta) \in \psi^d_p : (\sigma \in S \Rightarrow \forall \varepsilon \le \delta : \sigma + \varepsilon \in S)))$, i.e., if a
jump transition originates in $S$ then it must terminate in $S$, and if a delay transition originates in a state in $S$ then
any state passed by the delay plus the target state must be in $S$. A timed state sequence $\langle (\sigma_0, \tau_0), (\sigma_1, \tau_1) \cdots \rangle$,
where $\tau_i \in \mathbb{R}_{\ge 0}$, is a computation of $\mathcal{P}$ iff the following conditions are satisfied: (1) $\forall j > 0 : (\sigma_{j-1}, \sigma_j) \in \psi_p$,
(2) if it is finite and terminates in $(\sigma_l, \tau_l)$ then there does not exist state $\sigma$ such that $(\sigma_l, \sigma) \in \psi_p$, and (3) the
sequence $\langle \tau_0, \tau_1 \cdots \rangle$ satisfies the following constraints:

Monotonicity: $\tau_i \le \tau_{i+1}$ for all $i \in \mathbb{N}$.

Divergence: For all $t \in \mathbb{R}_{\ge 0}$, there exists $j$ such that $\tau_j \ge t$.

The projection of a set of program transitions $\psi_p$ on state predicate $S$ (denoted $\psi_p | S$) is the set of transitions
$\{(\sigma_0, \sigma_1) \mid (\sigma_0, \sigma_1) \in \psi^s_p \wedge \sigma_0, \sigma_1 \in S\} \cup \{(\sigma, \delta) \mid (\sigma, \delta) \in \psi^d_p \wedge \sigma \in S \wedge (\forall \varepsilon \le \delta : \sigma + \varepsilon \in S)\}$. I.e.,
$\psi_p | S$ consists of jump transitions of $\psi_p$ that start in $S$ and end in $S$, and delay transitions of $\psi_p$ that start and
remain in $S$ continuously.

2.2  Specification 

A specification (or property), denoted $\Sigma$, is a set of timed state sequences of the form $\langle (\sigma_0, \tau_0), (\sigma_1, \tau_1) \cdots \rangle$.
Following Henzinger [24], we require the sequence $\langle \tau_0, \tau_1 \cdots \rangle$ to satisfy monotonicity and divergence. We
now define what it means for a program $\mathcal{P}$ to satisfy a specification $\Sigma$. Given a program $\mathcal{P}$, a state predicate $S$,
and a specification $\Sigma$, we write $\mathcal{P} \models_S \Sigma$ and say that program $\mathcal{P}$ satisfies $\Sigma$ from $S$ iff (1) $S$ is closed in $\mathcal{P}$, and
(2) every computation of $\mathcal{P}$ that starts where $S$ is true is in $\Sigma$. If $\mathcal{P} \models_S \Sigma$ and $S \neq \{\}$, we say $S$ is an invariant
of $\mathcal{P}$ for $\Sigma$.

Notation. Whenever the specification is clear from the context, we will omit it; thus, “$S$ is an invariant of $\mathcal{P}$”
abbreviates “$S$ is an invariant of $\mathcal{P}$ for $\Sigma$”.

We say that program $\mathcal{P}$ maintains $\Sigma$ iff for all finite timed state sequences $\alpha$ of $\mathcal{P}$, there exists a timed state
sequence $\beta$ such that $\alpha\beta \in \Sigma$. Similarly, we say that $\mathcal{P}$ violates $\Sigma$ iff it is not the case that $\mathcal{P}$ maintains $\Sigma$. Note
that the definition of maintains identifies the property of finite timed state sequences, whereas the definition of
satisfies expresses the property of infinite timed state sequences.

Following Alpern and Schneider [26] and Henzinger [24], we let the specification consist of a liveness
specification and a safety specification. The liveness specification is represented by a set of infinite computations.
A program satisfies the liveness specification if every computation prefix of the program has a suffix that is in
the liveness specification.

Remark  2.1:  In  the  synthesis  problem,  we  begin  with  an  initial  fault-intolerant  program  that  satisfies  its 
specification  (including  the  liveness  specification)  in  the  absence  of  faults.  In  Section  5,  we  show  that  our 
synthesis  algorithms  preserve  liveness  specification.  Hence,  the  liveness  specification  need  not  be  specified 
explicitly. 

Regarding safety, in the synthesis algorithms presented in this paper, we let the safety specification consist
of (1) a set $\Sigma_{bt}$ of bad transitions that should not occur in the program computation, i.e., a subset of
$\{(\sigma_0, \sigma_1) \mid \sigma_0, \sigma_1 \in S_p\}$, and (2) a conjunction of zero or more bounded response properties of the form
$\Sigma_{br} \equiv ((P_1 \mapsto_{\le \delta_1} Q_1) \wedge (P_2 \mapsto_{\le \delta_2} Q_2) \wedge \ldots \wedge (P_m \mapsto_{\le \delta_m} Q_m))$, i.e., it is always the case that a state in
$P_i$ is followed by a state in $Q_i$ within $\delta_i$ time units, where $P_i$ and $Q_i$ are state predicates and $\delta_i \in \mathbb{Z}_{\ge 0}$, for all
$i$ such that $1 \le i \le m$. Observe that it is possible to trivially translate this concise representation of safety into
the corresponding set of infinite computations. The same concept is applicable to definitions of maintains and
violates.
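To make the bounded response form concrete, the following Python sketch checks a single property of the form P ↦≤δ Q against a finite prefix of a timed state sequence. It is only an illustration under stated assumptions: the function name `violates_bounded_response` is hypothetical, only the listed states are inspected (states passed during delays are ignored), and a state that is in both P and Q is assumed to discharge its own obligation immediately.

```python
def violates_bounded_response(seq, in_p, in_q, delta):
    """Check a finite timed state sequence against P |->_{<= delta} Q.

    seq   : list of (state, timestamp) pairs with nondecreasing timestamps
    in_p  : predicate deciding membership of a state in P
    in_q  : predicate deciding membership of a state in Q
    Returns True iff the prefix already exhibits a violation, i.e., some
    P-state at time t sees no Q-state up to time t + delta although the
    observed time has moved past t + delta.
    """
    if not seq:
        return False
    last_time = seq[-1][1]
    for i, (state, t) in enumerate(seq):
        if not in_p(state):
            continue
        deadline = t + delta
        met = any(in_q(s) for s, u in seq[i:] if u <= deadline)
        if not met and last_time > deadline:
            return True
    return False

# Hypothetical request/acknowledge trace used only for illustration.
trace = [("req", 0), ("wait", 3), ("ack", 5)]
is_req = lambda s: s == "req"
is_ack = lambda s: s == "ack"
on_time = violates_bounded_response(trace, is_req, is_ack, 5)   # False
too_late = violates_bounded_response(trace, is_req, is_ack, 4)  # True
```

A prefix that ends before the deadline expires is not reported as a violation, since the property can still be met by some extension of that prefix.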

2.3  Region  Graph 

Given a program $\mathcal{P}\langle S_p, \psi_p \rangle$, in order to reason about properties of $\mathcal{P}$, one must deal with the infinite state space
$S_p$. Alur and Dill [25] propose construction of a finite quotient as a solution for dealing with the infinite state
space. This construction uses an equivalence relation, called region equivalence (denoted $\equiv$), on the state space
that equates two states with the same location. The relation is defined over the set of all clock valuations for $X$.
For two clock valuations $\nu$ and $\mu$, $\nu \equiv \mu$ iff:

1. $\forall x \in X : ((\lfloor \nu(x) \rfloor = \lfloor \mu(x) \rfloor) \vee (\nu(x), \mu(x) > c_x))$,

2. $\forall x, y \in X : ((\nu(x) \le c_x \wedge \nu(y) \le c_y)) : (\langle \nu(x) \rangle \le \langle \nu(y) \rangle$ iff $\langle \mu(x) \rangle \le \langle \mu(y) \rangle)$, and

3. $\forall x \in X : \nu(x) \le c_x : (\langle \nu(x) \rangle = 0$ iff $\langle \mu(x) \rangle = 0)$,

where $c_x$ is the largest integer $c$ such that $x$ is compared with $c$ in a clock constraint, $\langle \tau \rangle$ denotes the fractional
part, and $\lfloor \tau \rfloor$ denotes the integral part of $\tau$ for any $\tau \in \mathbb{R}_{\ge 0}$. A clock region for $\mathcal{P}$ is an equivalence class
of clock valuations induced by $\equiv$. Note that there are only a finite number of clock regions.
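The three conditions of region equivalence can be checked directly for a pair of valuations. The Python sketch below is an illustration, not the paper's construction: the names `region_equivalent`, `frac`, and `cmax` are hypothetical, valuations are assumed to be dictionaries over the same clocks, `cmax[x]` plays the role of $c_x$, and exact rationals are used so that fractional-part comparisons are not disturbed by floating-point rounding.

```python
from fractions import Fraction
from math import floor

def frac(t):
    """Fractional part of a nonnegative rational."""
    return t - floor(t)

def region_equivalent(v, u, cmax):
    """Decide region equivalence of clock valuations v and u.

    cmax[x] is the largest constant that clock x is compared with.
    """
    clocks = list(cmax)
    # Condition 1: equal integral parts, or both values exceed cmax[x].
    for x in clocks:
        both_above = v[x] > cmax[x] and u[x] > cmax[x]
        if not (floor(v[x]) == floor(u[x]) or both_above):
            return False
    bounded = [x for x in clocks if v[x] <= cmax[x]]
    for x in bounded:
        # Condition 3: fractional part is zero in v iff it is zero in u.
        if (frac(v[x]) == 0) != (frac(u[x]) == 0):
            return False
        # Condition 2: the ordering of fractional parts is preserved.
        for y in bounded:
            if (frac(v[x]) <= frac(v[y])) != (frac(u[x]) <= frac(u[y])):
                return False
    return True

# Hypothetical clocks and constants used only for illustration.
cmax = {"x": 2, "y": 1}
a = {"x": Fraction(3, 10), "y": Fraction(7, 10)}
b = {"x": Fraction(1, 10), "y": Fraction(9, 10)}
c = {"x": Fraction(7, 10), "y": Fraction(3, 10)}
same_region = region_equivalent(a, b, cmax)       # ordering of fractions agrees
different_region = region_equivalent(a, c, cmax)  # ordering of fractions flips
```

Here `a` and `b` fall in the same clock region, whereas `a` and `c` do not, because the order of the fractional parts of x and y is reversed.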

A region is a pair $(s, \rho)$, where $s$ is a location and $\rho$ is a clock region. Using the region equivalence
relation, we construct the region graph of $\mathcal{P}\langle S_p, \psi_p \rangle$ (denoted $R(\mathcal{P})\langle S^r_p, \psi^r_p \rangle$) as follows. Vertices of $R(\mathcal{P})$
(denoted $S^r_p$) are regions. Edges of $R(\mathcal{P})$ (denoted $\psi^r_p$) are of the form $(s_0, \rho_0) \rightarrow (s_1, \rho_1)$ iff for some clock
valuations $\nu_0 \in \rho_0$ and $\nu_1 \in \rho_1$, $(s_0, \nu_0) \rightarrow (s_1, \nu_1)$ is a transition in $\psi_p$. We say that a region $(s_0, \rho_0)$ of
region graph $R(\mathcal{P})$ is a deadlock region iff for all regions $(s_1, \rho_1)$, there does not exist an edge of the form
$(s_0, \rho_0) \rightarrow (s_1, \rho_1)$.

A region predicate $S^r$ with respect to a state predicate $S$ is defined by $S^r = \{(s, \rho) \mid \exists (s, \nu) : ((s, \nu) \in S \wedge \nu \in \rho)\}$.
Likewise, the region predicate with respect to invariant $S$ of a program $\mathcal{P}$ is called region
invariant $S^r$. The projection of a set of edges $\psi^r_p$ on region predicate $S^r$ (denoted $\psi^r_p | S^r$) is the set of edges
$\{(r_0, r_1) \mid (r_0, r_1) \in \psi^r_p \wedge r_0, r_1 \in S^r\}$.

Based on the above description of how to construct a region graph, in our synthesis algorithms in Section 5, we
transform a real-time program $\mathcal{P}\langle S_p, \psi_p \rangle$ into its corresponding region graph $R(\mathcal{P})\langle S^r_p, \psi^r_p \rangle$ by invoking the
subroutine ConstructRegionGraph. We also let this subroutine take state predicates and sets of transitions in
$\mathcal{P}$ (e.g., $S$ and $\Sigma_{bt}$) and return the corresponding region predicates and sets of edges in $R(\mathcal{P})$ (e.g., $S^r$ and
$\Sigma^r_{bt}$).

A clock region $\beta$ is a time-successor of a clock region $\alpha$ iff for each $\nu \in \alpha$, there exists $\tau \in \mathbb{R}_{>0}$, such that
$\nu + \tau \in \beta$, and $\nu + \tau' \in \alpha \cup \beta$ for all $\tau' < \tau$. We call a region $(s, \rho)$ a boundary region, if for each $\nu \in \rho$
and for any $\tau \in \mathbb{R}_{>0}$, $\nu$ and $\nu + \tau$ are not equivalent. A region is open, if it is not a boundary region. A region
$(s, \rho)$ is called an end region, if for all $\nu \in \rho$ and for all clocks $x$, $\nu(x) > c_x$.

3  Faults  and  Fault-Tolerance  in  Real-Time  Programs 

In  this  section,  we  extend  formal  definitions  of  faults  and  fault-tolerance  due  to  Arora  and  Gouda  [27]  and 
Arora  and  Kulkarni  [28],  so  that  they  fit  in  the  context  of  real-time  programs. 

The faults that a program is subject to are systematically represented by transitions. A class of faults $f$ for
program $\mathcal{P}\langle S_p, \psi_p \rangle$ is a subset of the set $S_p \times S_p$. We use $\psi_p [] f$ to denote the transitions obtained by taking the
union of the transitions in $\psi_p$ and the transitions in $f$.

We say that a state predicate $T$ is an $f$-span (read as fault-span) of $\mathcal{P}$ from $S$ iff the following conditions
are satisfied: (1) $S \subseteq T$, and (2) $T$ is closed in $\psi_p [] f$. Observe that for all computations of $\mathcal{P}$ that start at states
where $S$ is true, $T$ is a boundary in the state space of $\mathcal{P}$ up to which (but not beyond which) the state of $\mathcal{P}$ may
be perturbed by the occurrence of the transitions in $f$. Similar to the notion of region invariant (cf. Subsection
2.3), the region predicate with respect to fault-span $T$ of a program $\mathcal{P}$ is called region fault-span $T^r$. Likewise,
$f^r$ denotes the set of fault edges in $R(\mathcal{P})$ that correspond to fault transitions $f$ in $\mathcal{P}$.
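On a finite graph such as a region graph, the smallest predicate that contains the invariant and is closed under both program and fault edges is simply the set of vertices reachable from the invariant. The Python sketch below illustrates this under stated assumptions: the function name `smallest_fault_span` is hypothetical, vertices are arbitrary hashable labels, and edges are given as pairs. Any closed superset of the result is also a valid fault-span in the sense of the definition above.

```python
from collections import deque

def smallest_fault_span(invariant, program_edges, fault_edges):
    """Return the vertices reachable from `invariant` via program or fault edges.

    The result T satisfies (1) invariant is a subset of T and
    (2) T is closed under the union of program and fault edges.
    """
    successors = {}
    for source, target in set(program_edges) | set(fault_edges):
        successors.setdefault(source, set()).add(target)
    span = set(invariant)
    queue = deque(span)
    while queue:
        vertex = queue.popleft()
        for nxt in successors.get(vertex, ()):
            if nxt not in span:
                span.add(nxt)
                queue.append(nxt)
    return span

# Hypothetical four-vertex graph used only for illustration.
S_r = {"a", "b"}
program = {("a", "b"), ("b", "a"), ("c", "a"), ("d", "d")}
faults = {("a", "c")}
T_r = smallest_fault_span(S_r, program, faults)
```

In this example the fault edge perturbs the program from "a" to "c", so "c" joins the fault-span, while "d" stays outside because no program or fault edge leads to it from the invariant.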

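As we defined the computations of $\mathcal{P}$, we say that a timed state sequence, $\langle (\sigma_0, \tau_0), (\sigma_1, \tau_1) \cdots \rangle$, is a
computation of $\mathcal{P}$ in the presence of $f$ iff the following four conditions are satisfied: (1) $\forall j > 0 : (\sigma_{j-1}, \sigma_j) \in (\psi_p \cup f)$,
(2) if $\langle (\sigma_0, \tau_0), (\sigma_1, \tau_1) \cdots \rangle$ is finite and terminates in state $(\sigma_l, \tau_l)$ then there does not exist state $\sigma$ such that
$(\sigma_l, \sigma) \in \psi_p$, (3) $\langle \tau_0, \tau_1 \cdots \rangle$ satisfies monotonicity and divergence, and (4) $\exists n \ge 0 : (\forall j > n : (\sigma_{j-1}, \sigma_j) \in \psi_p)$.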

In this paper, we consider three levels of fault-tolerance, failsafe, nonmasking, and masking, based on the
properties satisfied in the presence of faults. For failsafe and masking fault-tolerance, we propose two cases,
soft and hard, based on satisfaction of timing constraints in the presence of faults. To motivate the idea of soft
and hard fault-tolerance, let us consider the railroad crossing problem. Suppose that a train is approaching a
railroad crossing. The safety specification requires “if the train is crossing, the gate should be closed”. Also,
the bounded response property requires that “once the gate is closed, it should reopen within 5 minutes”. In
this example, it may be catastrophic if the train is crossing while the gate is open due to occurrence of faults.
On the other hand, if the gate remains closed for more than 5 minutes due to occurrence of faults, the outcome
is not disastrous. Thus, depending upon the outcome of violation of a safety specification, the desired
fault-tolerance requirement changes. Hence, in the railroad crossing problem, the desired requirement is that the system
must tolerate faults that cause the gate to be open while the train is crossing and, hence, this system must be soft
fault-tolerant. Intuitively, a soft fault-tolerant real-time program is not required to satisfy its timing constraints
in the presence of faults.

Now, consider a system that controls the internal pressure of a boiler. Suppose that in this system, the safety
specification requires that once a pressure gauge reads 30 pounds per square inch, the controller must issue a
command to open a valve within 20 seconds. In such a system, if occurrence of faults causes the controller not
to respond within the required time, the outcome may be disastrous. Thus, our boiler controller must satisfy
its timing constraints even in the presence of faults. In other words, the boiler controller must be hard
fault-tolerant. Intuitively, a hard fault-tolerant real-time program must satisfy its timing constraints in the presence
of faults.

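We now present formal definitions of different levels of fault-tolerance. Let specification $\Sigma$ consist of $\Sigma_{bt}$
and $\Sigma_{br}$. We say that $\mathcal{P}$ is soft-failsafe $f$-tolerant from $S$ for $\Sigma$ iff the following conditions hold: (1) $\mathcal{P} \models_S \Sigma_{bt}$,
(2) $\mathcal{P} \models_S \Sigma_{br}$, and (3) there exists $T$ such that $T$ is an $f$-span of $\mathcal{P}$ from $S$, and $\mathcal{P}\langle S_p, \psi_p [] f \rangle$ maintains $\Sigma_{bt}$
from $T$. A program $\mathcal{P}$ is hard-failsafe $f$-tolerant from $S$ for $\Sigma$ iff $\mathcal{P}$ is soft-failsafe $f$-tolerant from $S$ for $\Sigma$ and
$\mathcal{P}\langle S_p, \psi_p [] f \rangle$ maintains $\Sigma_{br}$ from $T$.

Since a nonmasking fault-tolerant program need not satisfy safety in the presence of faults, $\mathcal{P}$ is nonmasking
$f$-tolerant from $S$ for $\Sigma$ with recovery time $\delta$, where $\delta \in \mathbb{Z}_{\ge 0}$, iff the following conditions hold: (1) $\mathcal{P} \models_S \Sigma_{bt}$,
(2) $\mathcal{P} \models_S \Sigma_{br}$, and (3) there exists $T$ such that $T$ is an $f$-span of $\mathcal{P}$ from $S$, and every computation of
$\mathcal{P}\langle S_p, \psi_p [] f \rangle$ that starts from a state in $T$ reaches a state in $S$ within $\delta$ time units.

A program $\mathcal{P}\langle S_p, \psi_p \rangle$ is soft-masking $f$-tolerant from $S$ for $\Sigma$ with recovery time $\delta$, where $\delta \in \mathbb{Z}_{\ge 0}$, iff the
following conditions hold: (1) $\mathcal{P} \models_S \Sigma_{bt}$, (2) $\mathcal{P} \models_S \Sigma_{br}$, (3) there exists $T$ such that $T$ is an $f$-span of $\mathcal{P}$ from
$S$ and $\mathcal{P}\langle S_p, \psi_p [] f \rangle$ maintains $\Sigma_{bt}$ from $T$, and (4) every computation of $\mathcal{P}\langle S_p, \psi_p [] f \rangle$ that starts from a state
in $T$ reaches a state in $S$ within $\delta$ time units. A program $\mathcal{P}\langle S_p, \psi_p \rangle$ is hard-masking $f$-tolerant from $S$ for $\Sigma$
with recovery time $\delta$, where $\delta \in \mathbb{Z}_{\ge 0}$, iff $\mathcal{P}$ is soft-masking $f$-tolerant from $S$ for $\Sigma$ with recovery time $\delta$, and
$\mathcal{P}\langle S_p, \psi_p [] f \rangle$ maintains $\Sigma_{br}$ from $T$.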
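The five levels differ only in which obligations must hold from the fault-span $T$ in the presence of faults; all of them additionally require that the program satisfies $\Sigma_{bt}$ and $\Sigma_{br}$ from $S$ in the absence of faults. The following Python sketch tabulates that reading of the definitions. The constant and function names are hypothetical and serve only as a summary aid.

```python
# Obligations in the presence of faults, as read from the definitions above.
MAINTAIN_BT = "maintains Sigma_bt from T"
MAINTAIN_BR = "maintains Sigma_br from T"
RECOVERY = "reaches S within delta time units"

LEVELS = {
    "soft-failsafe": frozenset({MAINTAIN_BT}),
    "hard-failsafe": frozenset({MAINTAIN_BT, MAINTAIN_BR}),
    "nonmasking": frozenset({RECOVERY}),
    "soft-masking": frozenset({MAINTAIN_BT, RECOVERY}),
    "hard-masking": frozenset({MAINTAIN_BT, MAINTAIN_BR, RECOVERY}),
}

def at_least_as_strong(level_a, level_b):
    """True iff level_a imposes every obligation that level_b imposes."""
    return LEVELS[level_a] >= LEVELS[level_b]
```

Under this tabulation, hard-masking is at least as strong as every other level, while nonmasking and soft-failsafe are incomparable.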

Notation. Whenever the specification $\Sigma$ and the invariant $S$ are clear from the context, we omit them; thus,
“$f$-tolerant” abbreviates “$f$-tolerant from $S$ for $\Sigma$”.

Assumption 3.1: Since the program $\mathcal{P}$ satisfies $\Sigma_{br} \equiv ((P_1 \mapsto_{\le \delta_1} Q_1) \wedge (P_2 \mapsto_{\le \delta_2} Q_2) \wedge \ldots \wedge (P_m \mapsto_{\le \delta_m} Q_m))$
in the absence of faults (cf. Remark 2.1), without loss of generality, we assume that for each bounded
response property $(P_i \mapsto_{\le \delta_i} Q_i)$, where $1 \le i \le m$, the intolerant program already has a clock variable that is
reset on transitions that go from a state in $\neg P_i$ to a state in $P_i$. This assumption simplifies dealing with the given
bounded response property, as we are ensured that the program itself keeps track of time when $P_i$ becomes true.
In case such a clock does not exist, we can simply add it without changing the semantics of the given program.

Assumption 3.2: We assume that faults are immediately detectable and that given a state of the program, we
can determine the number of faults that have occurred in reaching that state. (For example, one can achieve
this if the program has a variable that stores how many faults have occurred in a program computation.) This
assumption is needed only for hard fault-tolerance.

Assumption 3.3: We assume that the number of occurrences of faults in a program computation is bounded
by a pre-specified value $n$. This assumption is required since for commonly considered faults, it can be shown
that bounded-time recovery in the presence of unbounded occurrence of faults is impossible.


4  Problem  Statement 

Given are a fault-intolerant real-time program $\mathcal{P}\langle S_p, \psi_p \rangle$, its invariant $S$, a set of faults $f$, and a safety
specification $\Sigma$ such that $\mathcal{P} \models_S \Sigma$. Our goal is to synthesize a real-time program $\mathcal{P}'\langle S_p, \psi'_p \rangle$ with invariant $S'$ such
that $\mathcal{P}'$ is $f$-tolerant from $S'$ for $\Sigma$.

As mentioned in the introduction, our synthesis method obtains $\mathcal{P}'$ from $\mathcal{P}$ by adding fault-tolerance alone
to $\mathcal{P}$, i.e., $\mathcal{P}'$ does not introduce new behaviors to $\mathcal{P}$ when no faults have occurred. We now describe how we
formulate the problem. Observe that:

1. If $S'$ contains states that are not in $S$ then, in the absence of faults, $\mathcal{P}'$ may include computations that
start outside $S$. Since $\mathcal{P}' \models_{S'} \Sigma$, it would imply that $\mathcal{P}'$ is using a new way to satisfy $\Sigma$ in the absence of
faults (since $\mathcal{P}$ satisfies $\Sigma$ only from $S$). Therefore, we require that $S' \subseteq S$.

2. If $\psi'_p | S'$ contains a transition that is not in $\psi_p | S'$ then $\mathcal{P}'$ can use this transition in order to satisfy $\Sigma$ in
the absence of faults. Since this was not permitted in $\mathcal{P}$, we require that $\psi'_p | S' \subseteq \psi_p | S'$.

Thus, the synthesis problem is as follows (this definition will be instantiated for (soft and hard) failsafe,
nonmasking, and (soft and hard) masking $f$-tolerance):

Problem Statement 4.1. Given $\mathcal{P}\langle S_p, \psi_p \rangle$, $S$, $\Sigma$, and $f$ such that $\mathcal{P} \models_S \Sigma$.

Identify $\mathcal{P}'\langle S_p, \psi'_p \rangle$ and $S'$ such that

(C1) $S' \subseteq S$,

(C2) $\psi'_p | S' \subseteq \psi_p | S'$, and

(C3) $\mathcal{P}'$ is $f$-tolerant from $S'$ for $\Sigma$. □
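Conditions (C1) and (C2) are plain set inclusions and are easy to check once the program is represented as a finite graph, for instance its region graph. The Python sketch below illustrates this under stated assumptions: the names `project` and `satisfies_c1_c2` are hypothetical, only jump-style edges given as pairs are handled, and condition (C3) is deliberately not checked.

```python
def project(edges, predicate):
    """Edges that start and end inside `predicate`."""
    return {(a, b) for (a, b) in edges if a in predicate and b in predicate}

def satisfies_c1_c2(invariant, edges, new_invariant, new_edges):
    """Check (C1) S' is a subset of S and (C2) psi'|S' is a subset of psi|S'."""
    c1 = set(new_invariant) <= set(invariant)
    c2 = project(new_edges, new_invariant) <= project(edges, new_invariant)
    return c1 and c2

# Hypothetical graphs used only for illustration.
S = {"a", "b", "c"}
psi = {("a", "b"), ("b", "c"), ("c", "a")}
S_new = {"a", "b"}
psi_ok = {("a", "b"), ("b", "x")}    # adds an edge leaving S', which is allowed
psi_bad = {("a", "b"), ("b", "a")}   # adds a new edge inside S', which is not
accepted = satisfies_c1_c2(S, psi, S_new, psi_ok)
rejected = satisfies_c1_c2(S, psi, S_new, psi_bad)
```

The first candidate is accepted because its only new edge leaves $S'$ (such edges can serve as recovery outside the invariant), whereas the second is rejected because it introduces a new behavior inside $S'$.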

5  Adding  Fault-Tolerance  to  Real-Time  Programs 

In this section, we present our synthesis algorithms and NP-hardness result for adding fault-tolerance to an
existing real-time program. In particular, in Subsection 5.1, we describe our algorithms for adding (soft and
hard) failsafe fault-tolerance. In Subsection 5.2, we describe how we add nonmasking fault-tolerance. In
Subsection 5.3, we describe automated addition of (soft and hard) masking fault-tolerance. Finally, in Subsection
5.4, we consider the case of hard-masking fault-tolerance where two or more timing constraints must be met in
the presence of faults.

5.1 Automated Addition of Failsafe Fault-Tolerance to Real-Time Programs

In this subsection, we first present our algorithm for adding soft-failsafe fault-tolerance. Then, we describe our
algorithm for adding hard-failsafe fault-tolerance, where the synthesized program is required to satisfy at most one
bounded response property in the presence of faults. As mentioned in Subsection 2.2, the safety specification identifies a
set $\Sigma_{bt}$ of bad transitions that should not occur in any program computation, and a conjunction $\Sigma_{br}$ of multiple
bounded response properties. Also, recall that in the presence of faults, a soft-failsafe program is required to
maintain only $\Sigma_{bt}$, whereas a hard-failsafe program should maintain both $\Sigma_{bt}$ and $\Sigma_{br}$.

5.1.1 Adding Soft-Failsafe Fault-Tolerance

In order to synthesize a soft-failsafe program, we should generate a program $\mathcal{P}'$ such that a bad transition
$(\sigma_0, \sigma_1) \in \Sigma_{bt}$ does not occur in any computation of $\mathcal{P}'$ in the presence of faults. Towards this end, we adapt
the algorithm proposed in [6], which adds failsafe fault-tolerance to untimed programs.

We now describe our algorithm Add_SoftFailsafe (cf. Figure 1) in detail. First, we transform the real-time
program $\mathcal{P}(S_{\mathcal{P}}, \psi_{\mathcal{P}})$, invariant $S$, a set $f$ of fault transitions, and a set $\Sigma_{bt}$ of bad transitions into a region graph
$R(\mathcal{P})(S^r_{\mathcal{P}}, \psi^r_{\mathcal{P}})$, region invariant $S^r$, fault edges $f^r$, and bad edges $\Sigma^r_{bt}$ (Line A1). This is achieved by invoking
the subroutine ConstructRegionGraph, as described in Subsection 2.3. Then, the algorithm adds failsafe
fault-tolerance to $R(\mathcal{P})$, so that no edge of $\Sigma^r_{bt}$ occurs in computations of $R(\mathcal{P})$. This is achieved by invoking
the subroutine Add_UntimedFailsafe (Line A2).

The subroutine Add_UntimedFailsafe (cf. Figure 1) first finds the set $ms$ of regions and the set $mt$ of
edges from where safety of $\mathcal{P}$ may be violated by faults alone (lines C1, C2). Next, it removes such regions
(respectively, edges) from the region invariant $S^r$ (respectively, set of edges $\psi^r_{\mathcal{P}}$) of $R(\mathcal{P})$. This removal may
create deadlock regions. Hence, next, the subroutine removes deadlock regions from $S^r$ (Line C3), ensures
closure of $S^r$ in $\psi^r_{\mathcal{P}}$ (Line C5), and returns a failsafe region graph $R(\mathcal{P}')(S^r_{\mathcal{P}}, \psi'^r_{\mathcal{P}})$ (Line C6).

Finally, the algorithm Add_SoftFailsafe transforms the region graph $R(\mathcal{P}')$ back into a real-time program
$\mathcal{P}'$ (Line A3).
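
The untimed core of this procedure can be sketched on a finite graph. The Python fragment below is a minimal sketch, not the authors' implementation: it assumes regions are hashable labels and that program edges, fault edges, and bad edges are given as sets of pairs. The function names mirror the subroutines of Figure 1, but their signatures are simplified assumptions (for example, the $Q^r$ argument of RemoveDeadlocks is omitted because it is empty in the soft-failsafe case).

```python
def remove_deadlocks(regions, edges):
    """Largest subset of `regions` in which every region keeps an outgoing edge."""
    regions = set(regions)
    while True:
        dead = {r for r in regions
                if not any(u == r and v in regions for (u, v) in edges)}
        if not dead:
            return regions
        regions -= dead


def ensure_closure(edges, regions):
    """Drop edges that leave `regions`, so that `regions` is closed in `edges`."""
    return {(u, v) for (u, v) in edges if not (u in regions and v not in regions)}


def add_untimed_failsafe(edges, faults, invariant, bad):
    """Return (edges', invariant', mt), or None if no failsafe program exists."""
    # ms: regions from which a sequence of fault edges ends with a bad fault edge.
    ms = {u for (u, v) in faults if (u, v) in bad}
    changed = True
    while changed:
        changed = False
        for (u, v) in faults:
            if v in ms and u not in ms:
                ms.add(u)
                changed = True
    # mt: program edges that are bad or that lead into ms.
    mt = {(u, v) for (u, v) in edges if v in ms or (u, v) in bad}
    new_inv = remove_deadlocks(invariant - ms, edges - mt)
    if not new_inv:
        return None
    return ensure_closure(edges - mt, new_inv), new_inv, mt


# Hypothetical example: a fault in region "c" executes the bad edge ("c", "d").
edges = {("a", "b"), ("b", "a"), ("a", "c"), ("c", "c")}
faults = {("c", "d")}
bad = {("c", "d")}
result = add_untimed_failsafe(edges, faults, {"a", "b", "c"}, bad)
```

Here region "c" is removed from the invariant together with every program edge that enters it, which leaves the cycle between "a" and "b" as the failsafe behavior.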

Theorem 5.1. The algorithm Add_SoftFailsafe is sound and complete. □

For reasons of space, we refer the reader to [29] for proofs of all theorems in this paper.

Theorem 5.2. The problem of adding soft-failsafe fault-tolerance to a real-time program is in PSPACE. □

5.1.2  Adding  Hard-Failsafe  Fault-Tolerance  with  a  Single  Bounded  Response  Property 

In this subsection, we consider the case that a hard-failsafe fault-tolerant program is required to satisfy at most
one bounded response property in the presence of faults. In other words, $\Sigma_{br} \equiv P \mapsto_{\leq\delta} Q$. Towards this
end, we need to generate a program $\mathcal{P}'$, such that it maintains both $\Sigma_{bt}$ and $\Sigma_{br}$ in the presence of faults. In
other words, a bad transition $(\sigma_0, \sigma_1) \in \Sigma_{bt}$ occurs in no computation of $\mathcal{P}'$. Moreover, if a computation of
$\mathcal{P}'$ reaches a state in $P$ then it reaches a state in $Q$ within $\delta$ units of time even in the presence of faults. To
this end, we first add soft-failsafe fault-tolerance to $R(\mathcal{P})$ to ensure that $\mathcal{P}'$ maintains $\Sigma_{bt}$ in the presence of
faults. Then, we transform $R(\mathcal{P})$ to an ordinary weighted directed graph (called MaxDelay digraph). To ensure
that the maximum delay to reach a state in $Q$ from each state in $P$ is at most $\delta$ time units in the presence of
faults, we extract a subgraph of the MaxDelay digraph, such that the longest distance between the vertices that
correspond to the states in $P$ and $Q$ is at most $\delta$. Before we present our algorithm for adding hard-failsafe
fault-tolerance in detail, we reiterate how to construct a MaxDelay digraph from [30].

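Add_SoftFailsafe($\mathcal{P}(S_{\mathcal{P}}, \psi_{\mathcal{P}})$: real-time program, $f$: transitions, $S$: state predicate, $\Sigma_{bt}$: specification)
{
    $R(\mathcal{P})(S^r_{\mathcal{P}}, \psi^r_{\mathcal{P}}), S^r, f^r, \Sigma^r_{bt}$ := ConstructRegionGraph($\mathcal{P}(S_{\mathcal{P}}, \psi_{\mathcal{P}}), S, f, \Sigma_{bt}$);    (A1)
    $\psi'^r_{\mathcal{P}}, S'^r, mt$ := Add_UntimedFailsafe($R(\mathcal{P})(S^r_{\mathcal{P}}, \psi^r_{\mathcal{P}}), f^r, S^r, \Sigma^r_{bt}$);    (A2)
    $\mathcal{P}'(S_{\mathcal{P}}, \psi'_{\mathcal{P}}), S'$ := ConstructRealTimeProgram($R(\mathcal{P}')(S^r_{\mathcal{P}}, \psi'^r_{\mathcal{P}}), S'^r$)    (A3)
}

Add_HardFailsafe($\mathcal{P}(S_{\mathcal{P}}, \psi_{\mathcal{P}})$: real-time program, $f$: transitions, $S, P, Q$: state predicate, $\Sigma_{bt}$: specification, $n, \delta$: integer)
{
    $R(\mathcal{P})(S^r_{\mathcal{P}}, \psi^r_{\mathcal{P}}), S^r, P^r, Q^r, f^r, \Sigma^r_{bt}$ := ConstructRegionGraph($\mathcal{P}(S_{\mathcal{P}}, \psi_{\mathcal{P}}), S, P, Q, f, \Sigma_{bt}$);    (B1)
    $\psi^r_{\mathcal{P}}, S^r, mt$ := Add_UntimedFailsafe($R(\mathcal{P})(S^r_{\mathcal{P}}, \psi^r_{\mathcal{P}}), f^r, S^r, \Sigma^r_{bt}$);    (B2)
    repeat
        IsQRemoved := false;    (B3)
        $\psi^r_{\mathcal{P}} := \psi^r_{\mathcal{P}} \cup \{((s_0, \rho_0), (s_1, \rho_1)) \mid (s_0, \rho_0) \notin S^r \wedge$
                $\exists \rho_2 \mid \rho_2$ is a time-successor of $\rho_0 : (\exists \lambda \subseteq X : \rho_1 = \rho_2[\lambda := 0])\}$;    (B4)
        $\psi^r_{\mathcal{P}}, ns$ := Add_BoundedRecovery($R(\mathcal{P})(S^r_{\mathcal{P}} - ms, \psi^r_{\mathcal{P}} - mt), f^r, P^r, Q^r, n, \delta$);    (B5)
        $rs := \{r_0 \mid \exists r_1, r_2, \ldots, r_n : (\forall j : 0 \leq j < n : (r_j, r_{j+1}) \in f^r) \wedge r_n \in ns \wedge r_n \in P^r\}$;    (B6)
        $rt := \{(r_0, r_1) \mid (r_0, r_1) \in \psi^r_{\mathcal{P}} \wedge r_1 \in rs\}$;    (B7)
        $S'^r$ := RemoveDeadlocks($S^r - (ns \cup rs), Q^r, \psi^r_{\mathcal{P}} - rt$);    (B8)
        if ($S'^r = \{\}$) then declare no hard-failsafe $f$-tolerant program $\mathcal{P}'$ exists; exit;    (B9)
        if ($\exists r_0 \in Q^r : (r_0 \in S^r \wedge r_0 \notin S'^r)$) then    (B10)
            IsQRemoved := true;    (B11)
            $S^r := S'^r$;    (B12)
            $Q^r := Q^r - \{r_0\}$;    (B13)
            $\psi^r_{\mathcal{P}} := \psi^r_{\mathcal{P}} - \{(r, r_0), (r_0, r) \mid r \in S^r\}$;    (B14)
    until (IsQRemoved = false)
    $\psi'^r_{\mathcal{P}}$ := EnsureClosure($\psi^r_{\mathcal{P}}, S'^r$);    (B15)
    $\mathcal{P}'(S_{\mathcal{P}}, \psi'_{\mathcal{P}}), S'$ := ConstructRealTimeProgram($R(\mathcal{P}')(S^r_{\mathcal{P}}, \psi'^r_{\mathcal{P}}), S'^r$)    (B16)
}

Add_UntimedFailsafe($R(\mathcal{P})(S^r_{\mathcal{P}}, \psi^r_{\mathcal{P}})$: region graph, $f^r$: set of edges, $S^r$: region predicate, $\Sigma^r_{bt}$: specification)
{
    $ms := \{r_0 \mid \exists r_1, r_2, \ldots, r_n : (\forall j \mid 0 \leq j < n : (r_j, r_{j+1}) \in f^r) \wedge (r_{n-1}, r_n) \in \Sigma^r_{bt}\}$;    (C1)
    $mt := \{(r_0, r_1) \mid (r_1 \in ms) \vee ((r_0, r_1) \in \Sigma^r_{bt})\}$;    (C2)
    $S^r$ := RemoveDeadlocks($S^r - ms, \{\}, \psi^r_{\mathcal{P}} - mt$);    (C3)
    if ($S^r = \{\}$) then declare no soft/hard-failsafe $f$-tolerant program $\mathcal{P}'$ exists; exit;    (C4)
    $\psi^r_{\mathcal{P}}$ := EnsureClosure($\psi^r_{\mathcal{P}} - mt, S^r$);    (C5)
    return $\psi^r_{\mathcal{P}}, S^r, mt$    (C6)
}

RemoveDeadlocks($S^r, Q^r$: region predicate, $\psi^r_{\mathcal{P}}$: set of edges)
// Returns the largest subset of $S^r$ from where all computations of $R(\mathcal{P})$ are infinite
{
    while ($\exists r_0 \mid r_0 \in S^r : (\forall r_1 \in S^r : (r_0, r_1) \notin \psi^r_{\mathcal{P}})$)
        $S^r := S^r - \{r_0\}$;
        if ($r_0 \in Q^r$) then break;
    return $S^r$
}

EnsureClosure($\psi^r_{\mathcal{P}}$: set of edges, $S^r$: region predicate)
{ return $\psi^r_{\mathcal{P}} - \{(r_0, r_1) \mid r_0 \in S^r \wedge r_1 \notin S^r\}$ }

Construction of MaxDelay digraph. We now describe the subroutine ConstructMaxDelayGraph that
transforms a region graph to a MaxDelay digraph. The subroutine takes a region graph $R(\mathcal{P})(S^r_{\mathcal{P}}, \psi^r_{\mathcal{P}})$ and a
set $f^r$ of fault edges as input, and constructs a MaxDelay digraph $G(V, A)$ as follows. The vertices of $G$ consist
of the regions in $R(\mathcal{P})$.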

Notation: We denote the weight of an arc $(v_0, v_1)$ by $Weight(v_0, v_1)$. Let $\gamma$ denote a bijective function that
maps each region $r \in S^r_{\mathcal{P}}$ to its corresponding vertex in $G$; i.e., $\gamma(r)$ is a vertex of $G$ that represents region $r$ of
$R(\mathcal{P})$. Also, let $\gamma^{-1}$ denote the inverse of $\gamma$; i.e., $\gamma^{-1}(v)$ is the region of $R(\mathcal{P})$ that corresponds to vertex $v$ in
$V$. Let $\Gamma$ be a function that maps a region predicate in $R(\mathcal{P})$ to the corresponding set of vertices of $G$ and let
$\Gamma^{-1}$ be its inverse. Finally, for a boundary region $r$ with respect to clock variable $x$, we denote the value of $x$
by $r.x$ (equal to some constant in $\mathbb{Z}_{\geq 0}$).

Arcs of $G$ consist of the following:

• Arcs of weight 0 from $v_0$ to $v_1$, if $\gamma^{-1}(v_0) \rightarrow \gamma^{-1}(v_1)$ represents a jump transition in $R(\mathcal{P})$.

• Arcs of weight $c' - c$, where $c, c' \in \mathbb{Z}_{\geq 0}$ and $c' > c$, from $v_0$ to $v_1$, if $\gamma^{-1}(v_0)$ and $\gamma^{-1}(v_1)$ are both
boundary regions with respect to clock variable $x_i$, such that $\gamma^{-1}(v_0).x_i = c$, $\gamma^{-1}(v_1).x_i = c'$, and there
is a path in $R(\mathcal{P})$ from $\gamma^{-1}(v_0)$ to $\gamma^{-1}(v_1)$, which does not reset $x_i$.

• Arcs of weight $c' - c - \epsilon$, where $c, c' \in \mathbb{Z}_{\geq 0}$, $c' > c$, and $\epsilon \ll 1$, from $v_0$ to $v_1$, if (1) $\gamma^{-1}(v_0)$ is a
boundary region with respect to clock $x_i$, (2) $\gamma^{-1}(v_1)$ is an open region whose time-successor $\gamma^{-1}(v_2)$
is a boundary region with respect to clock $x_i$, (3) $\gamma^{-1}(v_0) \rightarrow \gamma^{-1}(v_1)$ represents a delay transition in
$R(\mathcal{P})$, and (4) $\gamma^{-1}(v_0).x_i = c$ and $\gamma^{-1}(v_2).x_i = c'$.

• Self-loop arcs of weight $\infty$ at vertex $v$, if $\gamma^{-1}(v)$ is an end region.

In order to compute the maximum delay between regions in $P^r$ and $Q^r$, it suffices to find the longest distance
between $\Gamma(P^r)$ and $\Gamma(Q^r)$ in $G$. Note that strongly connected components reachable from $\Gamma(P^r)$ containing
an arc of nonzero weight cause a maximum delay of infinity.
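
To make the longest-distance computation concrete, the following Python sketch computes the maximum delay on a small weighted digraph. It is an illustration under stated assumptions rather than the construction of [30]: arcs are given as a dictionary from vertex pairs to nonnegative weights, target vertices are treated as absorbing, and a Bellman-Ford style relaxation (a technique chosen for this sketch) detects reachable cycles of positive total weight. The function name `max_delay` is hypothetical.

```python
import math


def max_delay(arcs, sources, targets):
    """Longest distance from `sources` to `targets`; math.inf if unbounded.

    arcs: dict mapping (u, v) to a nonnegative weight.
    """
    vertices = {u for (u, _) in arcs} | {v for (_, v) in arcs}
    vertices |= set(sources) | set(targets)
    dist = {v: -math.inf for v in vertices}
    for s in sources:
        dist[s] = 0.0
    # Arcs leaving a target are ignored: the delay is measured up to the target.
    usable = [(u, v, w) for (u, v), w in arcs.items() if u not in targets]
    for _ in range(len(vertices)):
        changed = False
        for u, v, w in usable:
            if dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    else:
        # Still improving after |V| rounds: a positive-weight cycle is reachable.
        return math.inf
    reached = [dist[t] for t in targets if dist[t] > -math.inf]
    # If no target is reachable, the sketch reports the delay as unbounded.
    return max(reached) if reached else math.inf


# Hypothetical digraph: two paths from "p" to "q" with delays 1 and 2 + 3.
arcs = {("p", "a"): 2, ("a", "q"): 3, ("p", "q"): 1}
acyclic_delay = max_delay(arcs, {"p"}, {"q"})

# A self-loop of nonzero weight at "a" makes the maximum delay infinite.
cyclic_delay = max_delay({**arcs, ("a", "a"): 1}, {"p"}, {"q"})
```

A zero-weight cycle does not change the result in this sketch, which matches the observation that only components containing an arc of nonzero weight yield an infinite delay.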

We now describe our algorithm Add_HardFailsafe (cf. Figure 1) in detail. The algorithm takes a real-time
program $\mathcal{P}$ with invariant $S$, a set of fault transitions $f$, a set of bad transitions $\Sigma_{bt}$, a bounded response property
$\Sigma_{br} \equiv P \mapsto_{\leq\delta} Q$, and the maximum number of occurrences of faults $n$ (cf. Assumption 3.3), and returns a
hard-failsafe program $\mathcal{P}'(S_{\mathcal{P}}, \psi'_{\mathcal{P}})$ with invariant $S'$. First, we transform $\mathcal{P}$ into its region graph $R(\mathcal{P})$ (Line B1). Let
$P^r$ and $Q^r$ be region predicates with respect to state predicates $P$ and $Q$, respectively. Then, to ensure that $\mathcal{P}'$
maintains $\Sigma_{bt}$, we add soft-failsafe fault-tolerance to $R(\mathcal{P})$ (Line B2). Next, we modify $R(\mathcal{P})$, such that any
computation that starts from a region in $P^r$ reaches a region in $Q^r$ in at most $\delta$ time units even in the presence
of faults. Towards this end, we compute the set of regions and edges from where $\Sigma_{br}$ is maintained (lines
B3-B14). In particular, to ensure that $Q$ is reachable from the states in $P \wedge \neg S$, we add edges that start from
each region in $S^r_{\mathcal{P}} - S^r$ and go to regions where the time monotonicity condition is preserved (Line B4). Now,
we invoke the subroutine Add_BoundedRecovery to ensure that $P \mapsto_{\leq\delta} Q$ is maintained in the presence of
faults (Line B5).

Add_BoundedRecovery($R(\mathcal{P})(S^r_{\mathcal{P}}, \psi^r_{\mathcal{P}})$: region graph, $f^r$: set of edges, $P^r, Q^r$: region predicate, $n, \delta$: integer)
// Adds bounded-time recovery from $P^r$ to $Q^r$ in the presence of $f^r$
{
    $G(V, A)$ := ConstructMaxDelayGraph($R(\mathcal{P})(S^r_{\mathcal{P}}, \psi^r_{\mathcal{P}}), f^r$);    (D1)
    // Let $G^i(V^i, A^i)$ be the portion of $G$ in which $(n - i)$ faults have occurred, where $0 \leq i \leq n$
    for each vertex $v \in V^0$: $Rank(v)$ := length of the shortest path from $v$ to a vertex in $\Gamma(Q^r)^0$;    (D2)
    for $i = 1$ to $n$    (D3)
        for each vertex $v_0 \in V^i$:    (D4)
            $V_f := \{v_1 \mid v_1 \in V^{i-1} \wedge (\gamma^{-1}(v_0), \gamma^{-1}(v_1)) \in f^r\}$;    (D5)
            if $V_f \neq \{\}$ then    (D6)
                $MinRank(v_0) := \max\{(Rank(v_1) + Weight(v_0, v_1))$ for all $v_1 \in V_f\}$;    (D7)
            else $MinRank(v_0) := 0$;    (D8)
        AdjustShortestPaths($G^i(V^i, A^i), \Gamma(P^r)^i, \Gamma(Q^r)^i$);    (D9)
    // Constructing a subgraph of each portion such that the longest distance between $\Gamma(P^r)$ and $\Gamma(Q^r)$ is at most $\delta$
    // and then adding the arcs and vertices that do not appear on paths from $\Gamma(P^r)$ to $\Gamma(Q^r)$
    for $i = 0$ to $n$    (D10)
        $G'^i(V'^i, A'^i) := \{\}$;    (D11)
        for each vertex $v \in \Gamma(P^r)^i$:    (D12)
            if $Rank(v) \leq \delta$ then    (D13)
                $\Pi$ := the shortest path from $v$ to a vertex in $\Gamma(Q^r)^i$;    (D14)
                $V'^i := V'^i \cup \{u \mid u$ is on $\Pi\}$;    (D15)
                $A'^i := A'^i \cup \{a \mid a$ is on $\Pi\}$;    (D16)
        $A'^i := A'^i \cup \{(u, v) \mid (u, v) \in A^i \wedge (u \notin V'^i \vee (u \in \Gamma(Q^r)^i))\}$;    (D17)
        $V'^i := V'^i \cup \{u \mid (\exists v : (u, v) \in A'^i \vee (v, u) \in A'^i)\}$;    (D18)
    // Transforming weighted digraph $G'$ into a region graph
    $\psi^r_{\mathcal{P}} := \{(r_0, r_1) \mid (r_0, r_1) \in \psi^r_{\mathcal{P}} \wedge (\gamma(r_0), \gamma(r_1)) \in A'\} \cup$
            $\{(r_1, r_2) \mid (r_1, r_2) \in \psi^r_{\mathcal{P}} \wedge (\gamma(r_1), \gamma(r_2)) \notin A' \wedge \exists r_0 : Weight(\gamma(r_0), \gamma(r_1)) = 1 - \epsilon\}$;    (D19)
    $ns := \{r \mid \gamma(r) \in V - V'\}$;    (D20)
    return $\psi^r_{\mathcal{P}}, ns$    (D21)
}

AdjustShortestPaths($G^i(V^i, A^i)$: directed weighted graph, $V_P, V_Q$: set of vertices)
// Adjusts the rank of each vertex based on the ranks computed in Add_BoundedRecovery
{
    for each vertex $v \in V_P$ apply Dijkstra's shortest path with the following change:
        if Dijkstra's shortest path computes a length less than $MinRank(v)$ then
            $Rank(v) := MinRank(v)$;    (D22)
        else $Rank(v)$ := length of Dijkstra's shortest path from $v$ to $V_Q$    (D23)
}

Figure 2: Addition of Bounded-Time Recovery in the Presence of Faults

The subroutine Add_BoundedRecovery (cf. Figure 2) adds bounded-time recovery to a given region
graph as follows. First, it transforms the given region graph $R(\mathcal{P})$ to a MaxDelay digraph $G(V, A)$ (Line D1).
Recall that, by Assumption 3.2, faults are detectable and $\mathcal{P}$ already has a variable that shows how many faults
have occurred in a computation. Thus, let $G^i(V^i, A^i)$ be the portion of $G$ in which $n - i$ faults have occurred,
where $0 \leq i \leq n$. More specifically, initially, a computation starts from portion $G^n$, where no faults have
occurred. If a fault occurs in a computation that is currently in portion $G^i$, the computation will proceed in
portion $G^{i-1}$. Obviously, if $n$ faults occur then the computation proceeds in portion $G^0$ and no faults will occur
in that computation. We use these portions to see whether it is possible to reach a vertex in $\Gamma(Q^r)$ from each
vertex in $\Gamma(P^r)$ within $\delta$ time units.

Figure 3: Adjusted Shortest Path.

Next, we rank vertices of all portions of $G$ using a modified Dijkstra's shortest path algorithm, which takes
fault perturbations into account (lines D2-D9 and D22-D23). More specifically, since no faults occur in $G^0$,
we first let the rank of all vertices $v \in V^0$ be the length of Dijkstra's shortest path from $v$ to a vertex in
$\Gamma(Q^r)^0$ (Line D2). Now, let $v_0$ be a vertex in $V^i$, where $1 \leq i \leq n$, and let $v_1$ be a vertex in $V^{i-1}$, such that
$(\gamma^{-1}(v_0), \gamma^{-1}(v_1))$ is a fault edge in $R(\mathcal{P})$ and both $v_0$ and $v_1$ are on a path from $\Gamma(P^r)$ to $\Gamma(Q^r)$. There
exist two cases: (1) the fault edge $(\gamma^{-1}(v_0), \gamma^{-1}(v_1))$ decreases or does not change the computation delay, i.e.,
the shortest distance from $v_1$ to a vertex in $\Gamma(Q^r)^{i-1}$ is less than or equal to the shortest distance from $v_0$ to a
vertex in $\Gamma(Q^r)^i$, and (2) the fault edge $(\gamma^{-1}(v_0), \gamma^{-1}(v_1))$ increases the computation delay, i.e., the shortest
distance from $v_1$ to a vertex in $\Gamma(Q^r)^{i-1}$ is greater than the shortest distance from $v_0$ to a vertex in $\Gamma(Q^r)^i$ (cf.
Figure 3). While the former case does not cause violation of $\Sigma_{br}$ in the presence of faults, the latter may.
Hence, the rank of $v_0$ must be at least the rank of $v_1$ in $V^{i-1}$. Moreover, if there exist multiple fault edges at
$\gamma^{-1}(v_0)$ then we take the maximum rank (Line D7). After computing the rank of vertices from where faults
may occur, we adjust the rank of the rest of the vertices from where faults do not occur by invoking the subroutine
AdjustShortestPaths (Line D9). Figure 3 illustrates how vertex rankings work.
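
The ranking step can be sketched in Python as follows. This is a simplified reading of lines D2-D9 and D22-D23, not the authors' code: it assumes that every portion $G^i$ is a copy of the same arc set, that fault arcs carry an explicit weight, and that vertices are strings (so that heap entries with equal ranks remain comparable). The names `floored_dijkstra` and `rank_with_faults` are hypothetical. The floor on each tentative distance plays the role of $MinRank$.

```python
import heapq
import math


def floored_dijkstra(arcs, targets, floor):
    """Shortest distance to `targets`, never smaller than floor[v] for vertex v."""
    reverse = {}
    for (u, v), w in arcs.items():
        reverse.setdefault(v, []).append((u, w))
    rank = {}
    heap = [(0.0, t) for t in targets]  # targets have rank 0 in this sketch
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if v in rank:
            continue
        rank[v] = d
        for u, w in reverse.get(v, []):
            if u not in rank:
                # A predecessor's rank is at least its MinRank (the floor).
                heapq.heappush(heap, (max(d + w, floor.get(u, 0.0)), u))
    return rank


def rank_with_faults(arcs, fault_arcs, targets, n):
    """ranks[i][v] is the rank of v in the portion where n - i faults occurred."""
    ranks = [floored_dijkstra(arcs, targets, {})]  # portion 0: no further faults
    for i in range(1, n + 1):
        previous = ranks[i - 1]
        floor = {}
        for (u, v), w in fault_arcs.items():
            # MinRank(u): worst rank reachable by one fault from u.
            candidate = previous.get(v, math.inf) + w
            floor[u] = max(floor.get(u, 0.0), candidate)
        ranks.append(floored_dijkstra(arcs, targets, floor))
    return ranks


# Hypothetical example: a fault at "a" moves the computation to the slow region "b".
arcs = {("p", "a"): 1, ("a", "q"): 1, ("b", "q"): 5}
fault_arcs = {("a", "b"): 0}
ranks = rank_with_faults(arcs, fault_arcs, {"q"}, 1)
```

Without faults the rank of "p" is 2, but once one fault may still occur the rank of "a" is raised to that of "b", and the rank of "p" grows accordingly.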

Now, for each portion $G^i$, we construct a subgraph of $G^i$ whose longest distance from each vertex in $\Gamma(P^r)^i$
to a vertex in $\Gamma(Q^r)^i$ is at most $\delta$ as follows (lines D11-D16). To this end, we begin with an empty digraph
$G'^i(V'^i, A'^i)$ and we first include shortest paths from each vertex $v \in \Gamma(P^r)^i$ to a vertex in $\Gamma(Q^r)^i$, provided
$Rank(v) \leq \delta$ (lines D13-D16). Note that adding such shortest paths does not create new paths of length greater
than $\delta$. Next, we include the remaining arcs and vertices in $G'^i$, so that no arcs of the form $(v_0, v_1)$, where $v_0$ is
on a path from $\Gamma(P^r)^i$ to $\Gamma(Q^r)^i$, are added (lines D17, D18).

Now, we transform the digraph $G'$ back into a region graph (Line D19). Finally, we return the set $\psi^r_{\mathcal{P}}$ of
edges from where $\Sigma_{br}$ may not be violated even in the presence of faults, and the set $ns$ of regions from where
$\Sigma_{br}$ may be violated in the presence of faults (lines D20, D21).

After adding bounded-time recovery, the algorithm Add_HardFailsafe first identifies the set $rs$ of regions
and the set $rt$ of edges from where faults alone may violate $\Sigma_{br}$ (lines B6, B7 in Figure 1). Then, it removes
such regions and edges along with the deadlock regions from $S^r$ (due to pruning some vertices and arcs in
step B5) in the same fashion that we did for adding soft-failsafe fault-tolerance (Line B8). However, while
removing deadlock regions, we need to consider a special case where a region $r_0 \in Q^r$ becomes a deadlock
region. In this case, it is possible that all the regions along the paths that start from a region in $P^r$ and end
in $r_0$ become deadlock regions. Hence, we need to find another path from the region in $P^r$ to a region in $Q^r$
other than $r_0$. Therefore, in this case, we remove $r_0$ from $S^r$ and $Q^r$ and start over (lines B10-B14). Finally, the
algorithm ensures closure of the invariant (Line B15) and transforms the synthesized region graph $R(\mathcal{P}')$ back
to a real-time program $\mathcal{P}'$ (Line B16).

Theorem 5.3. The algorithm Add_HardFailsafe is sound and complete. □

Theorem 5.4. The problem of adding hard-failsafe fault-tolerance to a real-time program, where the
synthesized program is required to satisfy at most one bounded response property in the presence of faults, is in
PSPACE. □

5.2  Automated  Addition  of  Nonmasking  Tolerance  to  Real-Time  Programs 

To derive a nonmasking $f$-tolerant program $\mathcal{P}'$, we ensure that if the state of $\mathcal{P}'$ is perturbed by faults in $f$
then it recovers to a state in $S$ within a pre-specified recovery time $\delta$. Since a nonmasking program is not
required to satisfy its safety specification in the presence of faults, to provide bounded-time recovery, it suffices
to invoke the subroutine Add_BoundedRecovery for state predicates $S_{\mathcal{P}} - S$ and $S$. Since the algorithm for
adding nonmasking fault-tolerance is very simple and, in Subsection 5.3, we describe how we add bounded-time
recovery from fault-span to invariant, we do not present the algorithm in a formal fashion.

5.3  Automated  Addition  of  Masking  Tolerance  to  Real-Time  Programs 

As  mentioned  in  Section  3,  in  masking  fault-tolerance  the  program  is  required  to  satisfy  its  safety  specification 
in  the  presence  of  faults  and  if  the  state  of  a  program  is  perturbed  by  faults  then  it  recovers  to  its  invariant  within 
a  bounded  amount  of  time.  In  Subsection  5.3.1,  we  present  our  synthesis  algorithm  for  adding  soft-masking 
fault-tolerance  to  an  existing  real-time  program.  Then,  in  Subsection  5.3.2,  we  discuss  the  issues  in  addition 
of  hard-masking  fault-tolerance,  where  the  synthesized  program  is  required  to  satisfy  at  most  one  bounded 
response  property  in  the  presence  of  faults. 

5.3.1  Adding  Soft-Masking  Fault-Tolerance 

In order to synthesize a soft-masking program, we should generate a program $\mathcal{P}'$ with invariant $S'$ and
fault-span $T'$, such that it never violates its safety specification and if a fault perturbs the state of a program to a state
in $T'$, it recovers to $S'$ within a pre-specified recovery time $\delta$. To this end, we extend the algorithm proposed
in [6] for adding masking fault-tolerance to untimed programs, such that it provides bounded-time recovery.

Add_SoftMasking($\mathcal{P}(S_{\mathcal{P}}, \psi_{\mathcal{P}})$: real-time program, $f$: transitions, $S$: state predicate, $\Sigma_{bt}$: specification, $n, \delta$: integer)
{
    $R(\mathcal{P})(S^r_{\mathcal{P}}, \psi^r_{\mathcal{P}}), S^r, f^r, \Sigma^r_{bt}$ := ConstructRegionGraph($\mathcal{P}(S_{\mathcal{P}}, \psi_{\mathcal{P}}), S, f, \Sigma_{bt}$);    (E1)
    Define $ms$ and $mt$ as in Add_UntimedFailsafe.    (E2)
    $S^r_1, T^r_1$ := RemoveDeadlocks($S^r - ms, \psi^r_{\mathcal{P}} - mt$), $S^r_{\mathcal{P}} - ms$;    (E3)
    repeat    (E4)
        $T^r_2, S^r_2 := T^r_1, S^r_1$;    (E5)
        $\psi^r_{\mathcal{P}_1} := \psi^r_{\mathcal{P}} | S^r_1 \cup \{((s_0, \rho_0), (s_1, \rho_1)) \mid (s_0, \rho_0) \in T^r_1 \wedge (s_0, \rho_0) \notin S^r_1 \wedge (s_1, \rho_1) \in T^r_1 \wedge$
                $\exists \rho_2 \mid \rho_2$ is a time-successor of $\rho_0 : (\exists \lambda \subseteq X : \rho_1 = \rho_2[\lambda := 0])\} - mt$;    (E6)
        $T^r_1$ := ConstructFaultSpan($T^r_1 - \{r \mid S^r_1$ is not reachable from $r$ using $\psi^r_{\mathcal{P}_1}\}, f^r$);    (E7)
        $S^r_1$ := RemoveDeadlocks($S^r_1 \wedge T^r_1, \psi^r_{\mathcal{P}_1}$);    (E8)
        if ($S^r_1 = \{\} \vee T^r_1 = \{\}$) then    (E9)
            declare no soft-masking $f$-tolerant program $\mathcal{P}'$ exists; exit;    (E10)
    until ($T^r_1 = T^r_2 \wedge S^r_1 = S^r_2$);    (E11)
    $\psi^r_{\mathcal{P}_1}, ns$ := Add_BoundedRecovery($R(\mathcal{P})(S^r_{\mathcal{P}}, \psi^r_{\mathcal{P}_1}), f^r, T^r_1 - S^r_1, S^r_1, n, \delta$);    (E12)
    $rs := \{r_0 \mid \exists r_1, r_2, \ldots, r_n : (\forall j : 0 \leq j < n : (r_j, r_{j+1}) \in f^r) \wedge r_n \in ns\}$;    (E13)
    $rt := \{(r_0, r_1) \mid (r_0, r_1) \in \psi^r_{\mathcal{P}_1} \wedge r_1 \in rs\}$;    (E14)
    $S^r_1$ := RemoveDeadlocks($S^r_1 - rs, \psi^r_{\mathcal{P}_1} - rt$);    (E15)
    if ($S^r_1 = \{\}$) then declare no soft-masking $f$-tolerant program $\mathcal{P}'$ exists; exit;    (E16)
    else $\psi'^r_{\mathcal{P}}$ := EnsureClosure($\psi^r_{\mathcal{P}_1} - rt, S^r_1 - rs$);    (E17)
    $S'^r, T'^r := S^r_1 - rs, T^r_1 - ns$;    (E18)
    $\mathcal{P}'(S_{\mathcal{P}}, \psi'_{\mathcal{P}}), S', T'$ := ConstructRealTimeProgram($R(\mathcal{P}')(S^r_{\mathcal{P}}, \psi'^r_{\mathcal{P}}), S'^r, T'^r$)    (E19)
}

ConstructFaultSpan($T^r$: region predicate, $f^r$: set of edges)
// Returns the largest subset of $T^r$ that is closed in $f^r$.
{
    while ($\exists r_0, r_1 : r_0 \in T^r \wedge r_1 \notin T^r \wedge (r_0, r_1) \in f^r$)
        $T^r := T^r - \{r_0\}$;
    return $T^r$
}

Figure 4: Addition of Soft-Masking Fault-Tolerance

Now, we describe the algorithm Add_SoftMasking (cf. Figure 4) in detail. First, we construct the region
graph $R(\mathcal{P})$ (Line E1). Our first estimate of a soft-masking program is a soft-failsafe program. Hence, we let
our first estimate $S^r_1$ be the region invariant of its soft-failsafe fault-tolerant program. Likewise, we estimate
$T'^r$ to be $T^r_1$, where $T^r_1$ includes all the regions in the region space minus the regions from where safety of
$R(\mathcal{P})$ may be violated (lines E2, E3). Next, we compute the set of edges $\psi^r_{\mathcal{P}_1}$, region fault-span $T^r_1$, and region
invariant $S^r_1$ of $R(\mathcal{P})$ in a loop as follows (lines E4-E11):

1. In order to compute the set of edges $\psi^r_{\mathcal{P}_1}$, we first include edges in $\psi^r_{\mathcal{P}} | S^r_1$. Then we consider edges that
start from a region $(s_0, \rho_0)$, where $(s_0, \rho_0) \in T^r_1 - S^r_1$, and end at a region $(s_1, \rho_1) \in T^r_1$ (by closure
of fault-span), such that the time monotonicity condition is preserved, i.e., there exists $\rho_2$, where $\rho_2$ is a
time-successor of $\rho_0$ and $\rho_1 = \rho_2[\lambda := 0]$, such that $\lambda$ is any subset of the set $X$ of clock variables of $\mathcal{P}$.
Finally, we remove the transitions $mt$ from this set (Line E6).

2. We recompute the region fault-span by first removing the regions from where $S^r_1$ is not reachable using
the edges in $\psi^r_{\mathcal{P}_1}$. Then, we remove regions from where closure of fault-span may be violated through a
fault edge, by invoking the subroutine ConstructFaultSpan (Line E7).

3. Since $S^r_1$ must be a subset of $T^r_1$, we recompute the region invariant by invoking the subroutine
RemoveDeadlocks for $S^r_1 \wedge T^r_1$ (Line E8) and jump back to step 1.
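
Step 2 of the loop can be sketched in Python on a finite graph. The fragment below is a minimal illustration, assuming regions are hashable labels and edges are sets of pairs; `construct_fault_span` follows the subroutine of Figure 4, while `regions_reaching` and `recompute_fault_span` are hypothetical helper names introduced only for this sketch.

```python
def construct_fault_span(span, faults):
    """Largest subset of `span` that no fault edge can leave."""
    span = set(span)
    while True:
        escaping = {u for (u, v) in faults if u in span and v not in span}
        if not escaping:
            return span
        span -= escaping


def regions_reaching(targets, edges):
    """Regions from which some region in `targets` is reachable via `edges`."""
    reach = set(targets)
    changed = True
    while changed:
        changed = False
        for (u, v) in edges:
            if v in reach and u not in reach:
                reach.add(u)
                changed = True
    return reach


def recompute_fault_span(span, invariant, edges, faults):
    """Drop regions that cannot recover to `invariant`, then close under faults."""
    recoverable = set(span) & regions_reaching(invariant, edges)
    return construct_fault_span(recoverable, faults)


# Hypothetical example: "t1" can recover to the invariant {"s"}, "t2" cannot.
span = {"s", "t1", "t2"}
edges = {("t1", "s"), ("t2", "t2")}

# With a single fault s -> t1 the fault-span shrinks to {"s", "t1"}.
closed = recompute_fault_span(span, {"s"}, edges, {("s", "t1")})

# If a second fault t1 -> t2 exists, removals cascade and nothing remains.
empty = recompute_fault_span(span, {"s"}, edges, {("s", "t1"), ("t1", "t2")})
```

The second call shows why the removal must be iterated: once "t1" is dropped because a fault can leave the span from it, the fault from "s" to "t1" also escapes, so "s" has to be dropped as well.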

Upon the termination of the repeat-until loop, recovery from $T^r_1$ to $S^r_1$ is provided, but not necessarily within $\delta$ time units.
Hence, we need to ensure that any computation of the soft-masking program $\mathcal{P}'(S_{\mathcal{P}}, \psi'_{\mathcal{P}})$ that starts from a state
in the fault-span $T'$ reaches its invariant $S'$ within $\delta$ time units, even in the presence of faults. In fact, we
need bounded-time recovery from each state in $T' - S'$ to a state in $S'$, which is in turn the bounded response
property $(T - S) \mapsto_{\leq\delta} S$. To this end, we invoke the subroutine Add_BoundedRecovery with parameters
$R(\mathcal{P})(S^r_{\mathcal{P}}, \psi^r_{\mathcal{P}_1})$, $f^r$, $T^r_1 - S^r_1$, $S^r_1$, $n$, and $\delta$ (Line E12). Since $S^r_1$ is closed in $\psi^r_{\mathcal{P}_1}$, unlike adding hard-failsafe,
we do not need to worry about removal of regions in $S^r_1$. However, if there exists a region $r_0 \in S^r_1$ that may
reach a region $r_1 \in ns$ by taking faults alone, where $ns$ is the set of regions from where recovery from $T^r_1$ is
not possible, $r_0$ becomes a region from where a program computation goes to the fault-span, but cannot recover
to the invariant in $\delta$ time units. Hence, we remove the regions (respectively, edges), from where by taking
faults alone a computation may reach a region in $ns$, from $S^r_1$ (respectively, $\psi^r_{\mathcal{P}_1}$) (lines E13-E17). Finally, we
construct the real-time program $\mathcal{P}'(S_{\mathcal{P}}, \psi'_{\mathcal{P}})$ with invariant $S'$ out of its region graph $R(\mathcal{P}')(S^r_{\mathcal{P}}, \psi'^r_{\mathcal{P}})$ and region
invariant $S'^r$ (lines E18, E19).

Theorem  5.5.  The  algorithm  Add_SoftMasking  is  sound  and  complete.  □ 

Theorem 5.6. The problem of adding soft-masking fault-tolerance to a real-time program is in PSPACE. □

5.3.2 Adding Hard-Masking Fault-Tolerance with a Single Bounded Response Property

To design a hard-masking fault-tolerant program $\mathcal{P}'$ from an intolerant program $\mathcal{P}$ for the case where $\Sigma_{br} \equiv
P \mapsto_{\leq\delta} Q$, we ensure that $\mathcal{P}'$ is soft-masking fault-tolerant and it maintains $P \mapsto_{\leq\delta} Q$ even in the presence of
faults. Note that, since $\mathcal{P}'$ is supposed to be a soft-masking program, it must provide bounded-time recovery,
which is in turn the bounded response property $(T - S) \mapsto_{\leq\delta'} S$. In other words, $\mathcal{P}'$ must satisfy two bounded
response properties simultaneously. A possible solution seems to be adding the bounded response properties
one after another. Note, however, that during the addition of the first property, we may unnecessarily remove a
transition that should have been kept in order to be able to add the second property. Hence, such a solution is
sound but not complete.

In this context, we note that in [31], the authors show that adding two unbounded response (leads-to)
properties to an untimed program is NP-hard in the state space. While there are subtle differences between
the problem considered in [31] and the problem of adding hard-masking (e.g., $P, Q \subseteq T$), based on [31], we
conjecture that the time complexity of adding hard-masking fault-tolerance even with a single bounded response
property is exponential in the size of the region graph.

5.4 Adding Hard Masking Fault-Tolerance with Multiple Bounded Response Properties

In this subsection, we note that if $\Sigma_{br}$ consists of multiple bounded response properties then adding hard
masking fault-tolerance to a real-time program is NP-hard.

Theorem 5.7. The problem of adding hard masking fault-tolerance to a real-time program, where the resulting
program is required to satisfy multiple bounded response properties in the presence of faults, is NP-hard in the
size of the region graph.

While we omit the proof of this theorem for reasons of space, we note that in [31], we have shown that
the problem of adding two (unbounded) response properties to a given program (in the absence of faults) is
NP-hard. The same proof can be extended to this problem, as adding hard fault-tolerance requires that bounded
liveness properties are preserved in the presence of faults. We refer the reader to [29] for proofs.

6  Discussion 

In this section, we justify our assumptions and discuss their effect on our approach.

Modeling safety specification. We choose to model the untimed part of safety specifications by a set of bad
transitions due to the recent results on the time complexity of synthesis algorithms that deal with a more general class
of specifications. In [32], Kulkarni and Ebnenasir show that the problem of adding masking fault-tolerance
to untimed programs, where the safety specification is specified in terms of a set of bad pairs of transitions, is
NP-hard. Furthermore, as mentioned in Subsection 5.4, if the safety specification consists of multiple bounded
response properties, the problem of adding hard masking fault-tolerance is also NP-hard in the size of the region graph.
Therefore, we argue that automated synthesis of both soft and hard fault-tolerant real-time programs is likely
to be more successful if one focuses on problems where the untimed part of safety can be represented by a set of
bad transitions. Moreover, we argue that automated synthesis methods for adding hard fault-tolerance are more
likely to succeed if the timing constraints of safety are represented by at most one bounded response property.

Safety specification in the absence and presence of faults. In many systems, the safety requirements in
the presence of faults may be weaker than those in the absence of faults. In this case, during synthesis, one
should specify the properties that should be met in the presence of faults. Since we begin with a fault-intolerant
program that meets the specification in the absence of faults and no new behaviors are added in the absence of
faults,  the  fault-tolerant  program  would  continue  to  satisfy  the  stronger  specification  in  the  absence  of  faults. 
Unbounded  number  of  faults.  In  our  work,  for  hard  fault-tolerance  we  assumed  that  the  number  of  fault 
occurrences  in  a  computation  is  bounded.  Note  that  if  the  number  of  faults  arc  unbounded  then  for  most 
interesting  scenarios,  the  synthesis  is  not  feasible.  To  illustrate  this,  observe  that  for  most  faults  considered 
in  practice,  the  occurrence  of  faults  causes  a  delay  in  satisfaction  of  a  bounded  response  property.  Thus, 
if an unbounded number of faults occur then hard fault-tolerance cannot be satisfied unless we ensure that the
program does not reach states where faults cannot occur.
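
The delay argument can be made concrete with a small arithmetic sketch. It assumes, purely for illustration, that every fault occurrence postpones the response by the same fixed amount; the function name and its parameters are hypothetical and do not appear in our algorithms.

```python
import math


def min_faults_to_miss_deadline(bound: float,
                                fault_free_time: float,
                                delay_per_fault: float) -> int:
    """Smallest k such that fault_free_time + k * delay_per_fault > bound.

    Assumption: each fault adds exactly delay_per_fault time units to the
    response time of a bounded response property with the given bound.
    """
    if delay_per_fault <= 0:
        raise ValueError("delay_per_fault must be positive")
    if fault_free_time > bound:
        return 0  # the bound is already missed without any fault
    return math.floor((bound - fault_free_time) / delay_per_fault) + 1
```

For any positive per-fault delay the returned number is finite, so a computation with unboundedly many faults eventually exceeds the bound under this assumption.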

State space explosion problem. The region graph is usually not the most efficient finite representation of
a real-time program. By contrast, zone automata [33] are considered a more efficient model used in model
checking techniques. In this paper, since our goal was to investigate the possibility of synthesizing fault-tolerant
real-time programs and to evaluate the complexity classes of such algorithms, we focused on detailed region
graphs. However, an interesting improvement step is modifying the algorithms presented in Section 5 so that
they manipulate a zone automaton rather than a region graph during synthesis.
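
To give a rough sense of how quickly region graphs grow, the sketch below evaluates the upper bound on the number of clock regions commonly cited in the timed automata literature, |X|! * 2^|X| * prod(2*c_x + 2), where X is the set of clocks and c_x is the largest constant compared against clock x. The bound is quoted from that literature rather than derived in this paper, and the function name is hypothetical.

```python
from math import factorial, prod
from typing import Dict


def region_count_upper_bound(max_constants: Dict[str, int]) -> int:
    """Upper bound on the number of clock regions.

    max_constants maps each clock name to the largest integer constant
    that the clock is compared against in the program.
    """
    n = len(max_constants)
    return factorial(n) * 2 ** n * prod(2 * c + 2 for c in max_constants.values())
```

With one clock and maximal constant 1 the bound is 8; with two such clocks it is already 128. The factorial and exponential factors in the number of clocks are the reason a coarser representation such as zones is attractive.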

7  Conclusion  and  Future  Work 

In  this  paper,  we  focused  on  the  problem  of  automatic  addition  of  fault-tolerance  to  real-time  programs.  We 
considered  three  levels  of  fault-tolerance,  failsafe,  nonmasking,  and  masking.  For  failsafe  and  masking,  we 
proposed  two  cases,  soft  and  hard,  based  on  satisfaction  of  timing  constraints  in  the  presence  of  faults.  In  our 
approach, we begin with an existing program rather than a specification and, hence, the previous efforts made for
synthesizing the input program are reused.

We  first  introduced  a  generic  framework  to  formally  define  the  notions  of  faults  and  fault-tolerance  in  the 
context  of  real-time  programs.  Then,  we  presented  sound  and  complete  algorithms  for  transforming  fault- 
intolerant  real-time  programs  into  soft-failsafe,  nonmasking,  and  soft-masking  fault-tolerant  programs.  We 
also  proposed  a  sound  and  complete  algorithm  that  synthesizes  hard-failsafe  fault-tolerant  real-time  programs, 
where  the  fault-tolerant  program  is  required  to  satisfy  at  most  one  bounded  response  property  in  the  presence  of 
faults. The complexity of our algorithms is polynomial in the size of the region graphs. We also showed that
the  problem  of  adding  hard  masking  fault-tolerance  to  real-time  programs,  where  the  fault-tolerant  program  is 
required  to  satisfy  multiple  bounded  response  properties  in  the  presence  of  faults,  is  NP-hard.  Thus,  this  work 
characterizes  classes  of  problems  where  adding  fault-tolerance  to  real-time  programs  is  expected  to  be  feasible 
and  where  the  complexity  is  too  high. 

Since  the  complexity  of  the  aforementioned  algorithms  is  comparable  to  that  of  existing  model  checking 
techniques,  we  believe  that  the  proposed  algorithms  can  be  used  in  tools  for  synthesizing  fault-tolerant  real-time 
programs in practice. More specifically, as future work, we plan to extend our tool FTSyn 2 (which is currently
capable of synthesizing fault-tolerant untimed programs), so that it synthesizes fault-tolerant real-time programs
as well.